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Linear Regression 


8.5.0 Linear Regression 

Sometimes we are interested in obtaining a simple model that explains the relationship 
between two or more variables. For example, suppose that we are interested in studying the 
relationship between the income of parents and the income of their children in a certain 
country. In general, many factors can impact the income of a person. Nevertheless, we suspect 
that children from wealthier families generally tend to become wealthier when they grow up. 
Here, we can consider two variables: 

1. The family income can be defined as the average income of parents at a certain period. 

2. The child income can be defined as his/her average income at a certain period (e.g, age). 

To examine the relationship between the two variables, we collect some data 

(xi,yi), for i = 1,2, • • • ,n, 

where y t is the average income of the 2th child, and X t is the average income of his/her 
parents. We are often interested in finding a simple model. A linear model is probably the 
simplest model that we can define, where we write 

yi ~ A + A®*- 

Of course, there are other factors that impact each child's future income, so we might write 

Vi — A) + Pl x i + e i 5 

where is modeled as a random variable. More specifically, if we approximate a child's 
future income by 2/i = A) + A X{, then € t indicates the error in our approximation. The 
goal here is to obtain the best values of A) and At that result in the smallest errors. In other 
words, we would like to draw a "line" in the X — y plane that best fits our data points. The 
line 


y — A + A * 

is called the regression line. Figure 8.11 shows the regression line. 
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Figure 8.11- Regression line is the line that best represents the data points 

(xi,yi). 


We may summarize our model as 


Y — fa + (3\X + e. 

Note that since 6 is a random variable, Y is also a random variable. The variable X is called 
the predictor or the explanatory variable, and the random variable Y is called the response 
variable. That is, here, we use X to predict/estimate Y. 
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